Poster: blackheart Date: May 9, 2009 10:17pm
Forum: web Subject: robots.txt misuse?

There used to be a company named "Piezo Crystal Company". They made crystal oscillators. Their web site was www.piezo-crystal.com. That company no longer exists, but the domain name has been bought by a cyber-squatter who owns 985 other domains. His web site includes a robots.txt file that has caused IA to purge the historical pages of the former company. Is that the way it's supposed to work? Doesn't it defeat the whole purpose of the IA to delete historical data because a domain name has changed hands?

I looked through the info on robots.txt, but didn't see anything that covered this situation.

Poster: kustota Date: May 10, 2009 1:23am
Forum: web Subject: Re: robots.txt misuse?

The information has not been 'deleted'; it has only stopped being 'public'. As I understand it, as soon as the robots.txt exclusion is removed, all archived data becomes public again.
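For anyone unsure what a robots.txt exclusion looks like in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The file contents and the `ia_archiver` user-agent are assumptions on my part (that is the crawler name I recall from the Wayback Machine exclusion instructions), and `is_excluded` is a made-up helper name. It only shows how a crawler reads such a file, not how the Archive actually applies it to pages it already holds.

```python
from urllib.robotparser import RobotFileParser

# Assumed example of an exclusion file; "ia_archiver" is the user-agent
# I believe the Wayback Machine instructions named for its crawler.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_excluded(robots_txt: str, url: str, agent: str = "ia_archiver") -> bool:
    """Return True if robots_txt forbids `agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "http://www.piezo-crystal.com/index.html"
    print(is_excluded(ROBOTS_TXT, page))  # prints True: exclusion in place
    print(is_excluded("", page))          # prints False: exclusion removed
```

The second call models the case described above: once the exclusion is gone from robots.txt, the same URL is no longer blocked.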

Poster: blackheart Date: May 10, 2009 10:04am
Forum: web Subject: Re: robots.txt misuse?

The info on robots.txt says that the exclusion would "remove all documents from your domain from the Wayback Machine". I took that to mean that they would be deleted.

In any case, the main point is that the person putting up the robots.txt file has no right to deny access to the old information. Ownership of the domain name doesn't necessarily mean ownership of the old information.

Poster: Face_ Date: May 10, 2009 10:30am
Forum: web Subject: Re: robots.txt misuse?

True indeed. You could e-mail the people at the Internet Archive (info@archive.org) and ask them if they can put that robots.txt file on some kind of blacklist, so that the original data can become visible again. I wonder if they have such a blacklist...

Poster: kustota Date: May 10, 2009 5:52pm
Forum: web Subject: Re: robots.txt misuse?

http://www2.sims.berkeley.edu/research/conferences/aps/removal-policy.html
'To comply with such requests, archivists may restrict access' to archived info. No word about deleting.
What do you think would be a bulletproof way for WM archivists to determine whether a robots.txt exclusion was placed by a domain squatter or by the 'info owner'? Maybe they would have to check every excluded website themselves?